Using the data science lifecycle of plan, data, model and deployment, I want to provide insights into many aspects of data science, programming in R and the world of the online game Guild Wars 2.
Data Science Lifecycle
Malcolm Forbes once said, “Money isn’t everything as long as you have enough of it”. Sure, money doesn’t buy happiness, but it can buy a lot of things. This is not only true for the real world, but especially in online games, where micro transactions and payed services have become the new normal.
More and more games give you the choice to invest a lot of time or a lot of money. Both ways you’ll somehow achieve the goal of the game faster. In some games you can even exchange digital goods for real money.
To get to the point, having digital currency in games can save you time, stress and even real money.
There are multiple ways to earn money in games. Often you’ll get on with just playing the game and getting some money from enemies, but for buying the good stuff, one needs a lot more money.
One way may be farming, which means hunting special items in high amounts and selling them to other players. Here’s the question, which items are worth collecting and can be sold for which prices?
Another way is buying items from other players and reselling them with a higher price tag, just like in the real world. Many games have auction houses where those transactions can be made. For this type of income one needs to know what to buy when, which prices are low or high, and when to sell for which amount.
For this project my example of game will be Guild Wars 2. Guild Wars 2 was published in 2012 by NCSoft. There are no monthly fees and the basic game is free to play since 2015. It is a massively multiplayer online role-playing game, meaning a lot of people playing in parallel in an online world. There are over 20.000 items that can be collected and many of them can be sold and bought at the auction house, called the trading post.
This is a picture of the auction house. There are many different types of items (Armor, Weapons, …)
When selling items, 5% of the cost is a fee for the auction house and gets immediately taken from your wallet. Each time an item is placed in the auction house, this 5% fee must be payed, so it’s best if the items get sold on the first run. When buying items, 10% of the price goes to the bank before the rest is delivered to the seller.
Because of that it’s important to find items where the buy-to-sell price ratio is the highest. In addition, as we buy and sell, we have to pay the whole 15% taxes which lowers our revenue.
Buying and selling items this way is called
flipping items in Guild Wars speech.
This is how it works:
Overall, we invested 1.05 gold and earned
1.8 gold. This is a revenue of 0.75 gold.
We always want to be the highest order and lowest offer, because if someone sells/buys blindly, the game will automatically take the best price for the player.
To make one thing clear, there are already a lot of web pages offering this or similar kind of services like GW2Efficiency or GW2TP on a much higher level. This should only be a fun project exploring the R language and data science in general.
For making the most out of our data, we can try to identify some important variables.
prices like average, mean and quantiles.difference between buy and sell
prices?outliers that are must-buys?quantity of items on the market?times when people buy/sell?return of investment (ROI)?Currently I’m really inexperienced with the auction market. I just buy the items I need.
The goal is to earn at least 18 gold and 6 silver just
by buying/selling items that the exploration/model rates as highest ROI.
This is the current values of 1$.
The data we need can be fetched from a REST API at https://api.guildwars2.com. There is also a good documentation on how to use the API.
We’ll use three endpoint here: 1. Get all items there are in the game
For each item:
At the end we can merge all this data to get a good overview of which items are sold/bought at the auction house. When joining all this together, we get a huge dataset of:
30.000 items500.000 buy auctions (orders)4.300.000 sell auctions (offers).There are also APIs where authorization is needed to get information about one’s user profile, but we won’t need this data for this analysis.
The code can be found in the file scrape.Rmd.
One important thing to notice: The api can be called with 1 - 200 items (IDs as query parameters). It is much more performant to call the API in batches of 200 items.
Load the data from CSV. We don’t want to get that amount of data from the API again and again.
dir <- getwd()
date <- params$data_date
price_list_buys <- read.csv(paste(dir, "/gw2-all-buys-raw-", date, ".csv", sep = ""))
price_list_sells <- read.csv(paste(dir, "/gw2-all-sells-raw-", date, ".csv", sep = ""))
item_list <- read.csv(paste(dir, "/gw2-all-items-raw-", date, ".csv", sep = ""))
df_buys <- item_list %>% left_join(price_list_buys)
df_sells <- item_list %>% left_join(price_list_sells)
rm(dir)
rm(date)
rm(item_list)
rm(price_list_sells)
rm(price_list_buys)
To get a quick impressions of the data, let’s have a look at the rough numbers.
This is how the buyer data looks like:
glimpse(df_sells)
## Rows: 3,855,802
## Columns: 12
## $ name <chr> "Sealed Package of Snowballs", "Sealed Package of Sn~
## $ type <chr> "Consumable", "Consumable", "Consumable", "Consumabl~
## $ rarity <chr> "Basic", "Basic", "Basic", "Basic", "Basic", "Basic"~
## $ vendor_value <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ id <int> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, ~
## $ icon <chr> "https://render.guildwars2.com/file/1D05D1EE04E16E69~
## $ level <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ item_type <chr> "Generic", "Generic", "Generic", "Generic", "Generic~
## $ item_weight_class <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ listings <int> 1, 1, 2, 3, 6, 9, 4, 4, 5, 2, 3, 6, 5, 11, 41, 5, 8,~
## $ unit_price <int> 275, 360, 372, 373, 374, 375, 376, 377, 378, 379, 38~
## $ quantity <int> 17, 55, 18, 137, 9, 263, 4, 8, 7, 5, 5, 365, 20, 386~
The seller data has the same structure, just for all the offers:
glimpse(df_buys)
## Rows: 437,289
## Columns: 12
## $ name <chr> "Sealed Package of Snowballs", "Sealed Package of Sn~
## $ type <chr> "Consumable", "Consumable", "Consumable", "Consumabl~
## $ rarity <chr> "Basic", "Basic", "Basic", "Basic", "Basic", "Basic"~
## $ vendor_value <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ id <int> 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, ~
## $ icon <chr> "https://render.guildwars2.com/file/1D05D1EE04E16E69~
## $ level <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ item_type <chr> "Generic", "Generic", "Generic", "Generic", "Generic~
## $ item_weight_class <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ listings <int> 1, 1, 4, 7, 5, 1, 3, 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ unit_price <int> 85, 81, 80, 77, 76, 75, 74, 72, 68, 67, 66, 65, 64, ~
## $ quantity <int> 169, 62, 1000, 1578, 1024, 250, 630, 2365, 250, 250,~
armor,
consumable, crafting material and so onListings, Unit_Price and Quantity can be read together.
E.g. for the first data entry:
listings) playerquantity) pieces of this itemunit_price) copper each.We’ll learn more about copper shortly.
Compare total items of all buys and sells:
nrow(df_sells)
## [1] 3855802
nrow(df_buys)
## [1] 437289
Compare unique items of buys and sells:
df_sells %>%
select(name) %>%
drop_na() %>%
unique() %>%
count() %>%
first()
## [1] 18931
df_buys %>%
select(name) %>%
drop_na() %>%
unique() %>%
count() %>%
first()
## [1] 18931
Let’s have a closer look at the data if we need to clean anything.
Because we get data from Guild Wars API, there is not much data cleaning needed.
We’ll just create some more fields for convenience and change some data types.
Coins are the basic currency
Just like in the real world, where 100 cents are 1 Euro, the main currency in Guild Wars, coins, are separated into three units.
There is copper, silver and gold. All those values are the same and are automatically calculated by the game.
10.000 copper == 100 silver == 1 gold.
So we need to divide all values by 10.000 to display the
price in gold, which is good for expensive goods and when we are
searching for the upper prices.
There may be occasions where it makes more sense to look for silver or even copper, but let’s introduce the price as gold first.
df_sells <- df_sells %>%
mutate(unit_price_gold = unit_price / 10000,
unit_price_silver = unit_price / 100,
unit_price_copper = unit_price) %>%
select(-unit_price)
df_buys <- df_buys %>%
mutate(unit_price_gold = unit_price / 10000,
unit_price_silver = unit_price / 100,
unit_price_copper = unit_price) %>%
select(-unit_price)
In the dataset, there are listings and quantities. Each row is one or more player offering or ordering one type of item for one specific price. We don’t have to care much about listings, as it’s only a subcategory of quantity, so let’s get rid of it.
df_sells <- df_sells %>%
select(-listings)
df_buys <- df_buys %>%
select(-listings)
Those are categorical values with a static set and can therefore be treated as factors.
df_sells <- df_sells %>%
mutate(rarity = as.factor(rarity),
type = as.factor(type),
item_type = as.factor(item_type),
item_weight_class = as.factor(item_weight_class))
df_buys <- df_buys %>%
mutate(rarity = as.factor(rarity),
type = as.factor(type),
item_type = as.factor(item_type),
item_weight_class = as.factor(item_weight_class))
This will make it easier to use their distinct values
df_types <- data.frame(types = levels(df_sells$type))
df_types %>%
gt(rowname_col = "name") %>%
tab_header(title = "Overview of existing types") %>%
cols_label(
types = ""
)
| Overview of existing types |
|---|
| Armor |
| Back |
| Bag |
| Consumable |
| Container |
| CraftingMaterial |
| Gizmo |
| MiniPet |
| Quux |
| Qux |
| Tool |
| Trinket |
| Trophy |
| UpgradeComponent |
| Weapon |
rm(df_types)
df_rarity <- data.frame(rarity = levels(df_sells$rarity))
df_rarity %>%
gt(rowname_col = "name") %>%
tab_header(title = "Overview of existing rarities") %>%
cols_label(
rarity = ""
)
| Overview of existing rarities |
|---|
| Ascended |
| Basic |
| Exotic |
| Fine |
| Legendary |
| Masterwork |
| Rare |
rm(df_rarity)
To make the data split reproducible, we set a seed.
TODO: Define y_label, do this for buys and sells?
set.seed(42)
# Put 3/4 of the data into the training set
#data_split <- initial_split(df_buys,
# prop = 3/4,
# strata = y_label,
# breaks = 4)
# Create dataframes for the two sets:
#train_data <- training(data_split)
#test_data <- testing(data_split)
Create validation set (used during modeling)
#cv_folds <-
# vfold_cv(train_data,
# v=5,
# strata = y_label,
# breaks = 4)
#df_train <- train_data
Now work with the train data.
df_types <- df_sells %>%
distinct(type, name) %>%
group_by(type) %>%
count(type)
df_types %>%
ggplot() +
geom_bar(aes(x = type, y = n, fill = type), stat='identity') +
theme(axis.text.x = element_text(angle = 45, hjust=1), legend.position="bottom") +
labs(title = "Type distribution", subtitle = "of all types", fill = "Types",
x = "Type", y = "Count", caption = paste("Data from", params$data_date))
There are a lot of different types of items. As we can see, Armor,
Consumables and Weapons are the most common items.
This means there are very many different weapon, consumable and armor items in the game, but not that many different crafting materials and upgrade components which seems reasonable.
Armors and Weapons can be further broken down into sub types:
df_item_types <- df_sells %>%
filter(type %in% c("Armor", "Weapon")) %>%
distinct(type, item_type, item_weight_class, name) %>%
group_by(type, item_type, item_weight_class) %>%
count(item_type)
df_item_types %>%
ggplot() +
geom_bar(aes(x = item_type, y = n, fill = item_weight_class), stat='identity') +
facet_grid(df_item_types$type) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.3, hjust=1)) +
labs(title = "Detailed item types", subtitle = "Split by armors and weapons", fill = "Item weight",
x = "Item types", y = "Count", caption = paste("Data from", params$data_date))
rm(df_item_types)
For armors we have the distinction between Light (for mages), Medium (for rangers) and Heavy (for warriors). Clothing are special items for the style points. There is no such distinction for weapons. Different classes can use different weapons, but all of them can be used by more than one class.
Let’s have a closer look at one of the types, Tinket.
df_sells %>%
filter(type == "Trinket") %>%
group_by(type, rarity) %>%
summarise(name = unique(name),
mean = mean(unit_price_gold),
icon = min(web_image(icon, height = 50))
) %>%
drop_na() %>%
arrange(desc(mean)) %>%
gt(rowname_col = "name") %>%
tab_header(title = "Trinkets", subtitle = "Items with special means") %>%
fmt_number(
columns = mean,
suffixing = "G"
) %>%
fmt_markdown(
columns = icon
) %>%
summary_rows(
columns = mean,
fns = list(TOTAL = "sum"),
formatter = fmt_number
) %>%
tab_footnote(
footnote = "Prices in gold",
locations = cells_column_labels(columns = mean)
) %>%
tab_source_note(
"Based on data from api.guildwars2.com"
) %>%
tab_options(
summary_row.background.color = "#ACEACE",
row_group.background.color = "#FFEFDB",
table.layout = "auto",
container.overflow.x = TRUE,
container.height = px(350)
)
| Trinkets | ||
|---|---|---|
| Items with special means | ||
| mean1 | icon | |
| Trinket - Basic | ||
| White Coral Ring | 5.13 | |
| Trinket - Fine | ||
| Turquoise Copper Amulet | 4.23 | |
| Tiger's Eye Copper Stud | 4.23 | |
| Tiger's Eye Copper Ring | 4.23 | |
| Turquoise Copper Ring | 4.23 | |
| Malachite Copper Ring | 4.23 | |
| Garnet Copper Amulet | 4.23 | |
| Garnet Copper Stud | 4.23 | |
| Garnet Copper Ring | 4.23 | |
| Turquoise Copper Stud | 4.23 | |
| Malachite Copper Stud | 4.23 | |
| Carnelian Silver Ring | 4.23 | |
| Peridot Silver Ring | 4.23 | |
| Carnelian Silver Amulet | 4.23 | |
| Peridot Silver Amulet | 4.23 | |
| Carnelian Silver Stud | 4.23 | |
| Peridot Silver Earring | 4.23 | |
| Lapis Silver Ring | 4.23 | |
| Lapis Silver Amulet | 4.23 | |
| Lapis Silver Earring | 4.23 | |
| Peridot Gold Ring | 4.23 | |
| Topaz Gold Ring | 4.23 | |
| Peridot Gold Amulet | 4.23 | |
| Topaz Gold Amulet | 4.23 | |
| Peridot Gold Earring | 4.23 | |
| Topaz Gold Earring | 4.23 | |
| Carnelian Gold Band | 4.23 | |
| Lapis Gold Ring | 4.23 | |
| Amethyst Gold Ring | 4.23 | |
| Carnelian Gold Pendant | 4.23 | |
| Lapis Gold Amulet | 4.23 | |
| Amethyst Gold Amulet | 4.23 | |
| Carnelian Gold Earring | 4.23 | |
| Lapis Gold Earring | 4.23 | |
| Amethyst Gold Earring | 4.23 | |
| Amethyst Silver Band | 4.23 | |
| Pearl Copper Ring | 4.23 | |
| Pearl Copper Amulet | 4.23 | |
| Amethyst Silver Pendant | 4.23 | |
| Pearl Copper Stud | 4.23 | |
| Amethyst Silver Earring | 4.23 | |
| Tiger's Eye Copper Amulet | 4.23 | |
| Malachite Copper Amulet | 4.23 | |
| Amber Copper Amulet | 4.23 | |
| Amber Copper Stud | 4.23 | |
| Amber Copper Ring | 4.23 | |
| Topaz Silver Band | 4.23 | |
| Topaz Silver Earring | 4.23 | |
| Topaz Silver Pendant | 4.23 | |
| Beryl Platinum Ring | 4.23 | |
| Chrysocola Platinum Ring | 4.23 | |
| Emerald Platinum Ring | 4.23 | |
| Coral Platinum Ring | 4.23 | |
| Ruby Platinum Ring | 4.23 | |
| Sapphire Platinum Ring | 4.23 | |
| Ruby Mithril Ring | 4.23 | |
| Sapphire Mithril Ring | 4.23 | |
| Beryl Mithril Ring | 4.23 | |
| Chrysocola Mithril Ring | 4.23 | |
| Emerald Mithril Ring | 4.23 | |
| Coral Mithril Ring | 4.23 | |
| Beryl Platinum Earring | 4.23 | |
| Chrysocola Platinum Earring | 4.23 | |
| Emerald Platinum Earring | 4.23 | |
| Coral Platinum Earring | 4.23 | |
| Ruby Platinum Earring | 4.23 | |
| Sapphire Platinum Earring | 4.23 | |
| Chrysocola Mithril Earring | 4.23 | |
| Emerald Mithril Earring | 4.23 | |
| Beryl Mithril Earring | 4.23 | |
| Sapphire Mithril Earring | 4.23 | |
| Ruby Mithril Earring | 4.23 | |
| Coral Mithril Earring | 4.23 | |
| Beryl Platinum Amulet | 4.23 | |
| Chrysocola Platinum Amulet | 4.23 | |
| Emerald Platinum Amulet | 4.23 | |
| Coral Platinum Amulet | 4.23 | |
| Ruby Platinum Amulet | 4.23 | |
| Sapphire Platinum Amulet | 4.23 | |
| Ruby Mithril Amulet | 4.23 | |
| Sapphire Mithril Amulet | 4.23 | |
| Beryl Mithril Amulet | 4.23 | |
| Chrysocola Mithril Amulet | 4.23 | |
| Emerald Mithril Amulet | 4.23 | |
| Coral Mithril Amulet | 4.23 | |
| Spinel Silver Ring | 4.23 | |
| Spinel Gold Ring | 4.23 | |
| Spinel Silver Earring | 4.23 | |
| Spinel Gold Earring | 4.23 | |
| Spinel Silver Amulet | 4.23 | |
| Spinel Gold Amulet | 4.23 | |
| Commissar's Manifesto | 4.23 | |
| Breath of Kralkatorrik | 4.23 | |
| Symbol of the Inquest | 4.23 | |
| Rotbeard's Treasure | 4.23 | |
| Pendant of Arah | 4.23 | |
| All Seeing | 4.23 | |
| Star of Dwayna | 4.23 | |
| Sentinel's Bane | 4.23 | |
| Circle of Arah | 4.23 | |
| Balthazar's Band | 4.23 | |
| Dwayna's Embrace | 4.23 | |
| Melandru's Bloom | 4.23 | |
| Signet of Grenth | 4.23 | |
| Eternal Ice | 4.23 | |
| Sam | 4.23 | |
| Ulgoth's Tail | 4.23 | |
| Rhendak's Signet | 4.23 | |
| Blue Coral Ring | 4.23 | |
| Passiflora Mithril Ring | 4.23 | |
| Passiflora Mithril Earring | 4.23 | |
| Passiflora Mithril Amulet | 4.23 | |
| Azurite Mithril Amulet | 4.23 | |
| Azurite Mithril Earring | 4.23 | |
| Azurite Mithril Ring | 4.23 | |
| Sunstone Silver Band | 4.23 | |
| Sunstone Gold Ring | 4.23 | |
| Opal Platinum Ring | 4.23 | |
| Opal Mithril Ring | 4.23 | |
| Sunstone Silver Earring | 4.23 | |
| Sunstone Gold Earring | 4.23 | |
| Opal Platinum Earring | 4.23 | |
| Opal Mithril Earring | 4.23 | |
| Sunstone Silver Pendant | 4.23 | |
| Sunstone Gold Amulet | 4.23 | |
| Opal Platinum Amulet | 4.23 | |
| Opal Mithril Amulet | 4.23 | |
| Trinket - Masterwork | ||
| Garnet Copper Ring | 2.90 | |
| Turquoise Copper Ring | 2.90 | |
| Malachite Copper Ring | 2.90 | |
| Tiger's Eye Copper Ring | 2.90 | |
| Amethyst Silver Band | 2.90 | |
| Garnet Copper Amulet | 2.90 | |
| Tiger's Eye Copper Amulet | 2.90 | |
| Turquoise Copper Stud | 2.90 | |
| Garnet Copper Stud | 2.90 | |
| Tiger's Eye Copper Stud | 2.90 | |
| Malachite Copper Stud | 2.90 | |
| Amethyst Silver Earring | 2.90 | |
| Malachite Copper Amulet | 2.90 | |
| Amethyst Silver Pendant | 2.90 | |
| Turquoise Copper Amulet | 2.90 | |
| Ring | 2.90 | |
| Earring | 2.90 | |
| Carnelian Silver Ring | 2.90 | |
| Carnelian Silver Amulet | 2.90 | |
| Carnelian Silver Stud | 2.90 | |
| Peridot Silver Ring | 2.90 | |
| Peridot Silver Pendant | 2.90 | |
| Peridot Silver Earring | 2.90 | |
| Carnelian Gold Band | 2.90 | |
| Amethyst Gold Ring | 2.90 | |
| Lapis Gold Ring | 2.90 | |
| Carnelian Gold Pendant | 2.90 | |
| Amethyst Gold Amulet | 2.90 | |
| Lapis Gold Amulet | 2.90 | |
| Carnelian Gold Earring | 2.90 | |
| Amethyst Gold Earring | 2.90 | |
| Lapis Gold Earring | 2.90 | |
| Peridot Gold Ring | 2.90 | |
| Topaz Gold Ring | 2.90 | |
| Peridot Gold Amulet | 2.90 | |
| Topaz Gold Amulet | 2.90 | |
| Peridot Gold Earring | 2.90 | |
| Topaz Gold Earring | 2.90 | |
| Iron Ring | 2.90 | |
| Pearl Copper Amulet | 2.90 | |
| Amber Copper Amulet | 2.90 | |
| Amber Copper Stud | 2.90 | |
| Pearl Copper Stud | 2.90 | |
| Pearl Copper Ring | 2.90 | |
| Amber Copper Ring | 2.90 | |
| Lapis Silver Ring | 2.90 | |
| Topaz Silver Band | 2.90 | |
| Lapis Silver Earring | 2.90 | |
| Topaz Silver Earring | 2.90 | |
| Lapis Silver Amulet | 2.90 | |
| Topaz Silver Pendant | 2.90 | |
| Ruby Platinum Ring | 2.90 | |
| Sapphire Platinum Ring | 2.90 | |
| Beryl Platinum Ring | 2.90 | |
| Chrysocola Platinum Ring | 2.90 | |
| Emerald Platinum Ring | 2.90 | |
| Coral Platinum Ring | 2.90 | |
| Chrysocola Mithril Ring | 2.90 | |
| Emerald Mithril Ring | 2.90 | |
| Beryl Mithril Ring | 2.90 | |
| Sapphire Mithril Ring | 2.90 | |
| Ruby Mithril Ring | 2.90 | |
| Coral Mithril Ring | 2.90 | |
| Ruby Platinum Earring | 2.90 | |
| Sapphire Platinum Earring | 2.90 | |
| Beryl Platinum Earring | 2.90 | |
| Chrysocola Platinum Earring | 2.90 | |
| Emerald Platinum Earring | 2.90 | |
| Coral Platinum Earring | 2.90 | |
| Chrysocola Mithril Earring | 2.90 | |
| Sapphire Mithril Earring | 2.90 | |
| Ruby Mithril Earring | 2.90 | |
| Coral Mithril Earring | 2.90 | |
| Emerald Mithril Earring | 2.90 | |
| Beryl Mithril Earring | 2.90 | |
| Ruby Platinum Amulet | 2.90 | |
| Sapphire Platinum Amulet | 2.90 | |
| Beryl Platinum Amulet | 2.90 | |
| Chrysocola Platinum Amulet | 2.90 | |
| Emerald Platinum Amulet | 2.90 | |
| Coral Platinum Amulet | 2.90 | |
| Chrysocola Mithril Amulet | 2.90 | |
| Emerald Mithril Amulet | 2.90 | |
| Beryl Mithril Amulet | 2.90 | |
| Sapphire Mithril Amulet | 2.90 | |
| Ruby Mithril Amulet | 2.90 | |
| Coral Mithril Amulet | 2.90 | |
| Spinel Silver Ring | 2.90 | |
| Spinel Gold Ring | 2.90 | |
| Spinel Silver Earring | 2.90 | |
| Spinel Gold Earring | 2.90 | |
| Spinel Silver Amulet | 2.90 | |
| Spinel Gold Amulet | 2.90 | |
| Purple Coral Ring | 2.90 | |
| Mighty Amulet | 2.90 | |
| Strong Amulet | 2.90 | |
| Berserker's Amulet | 2.90 | |
| Precise Amulet | 2.90 | |
| Penetrating Amulet | 2.90 | |
| Carrion Amulet | 2.90 | |
| Mighty Ring | 2.90 | |
| Strong Ring | 2.90 | |
| Penetrating Ring | 2.90 | |
| Precise Ring | 2.90 | |
| Berserker's Ring | 2.90 | |
| Carrion Ring | 2.90 | |
| Fertilizer Pouch | 2.90 | |
| Ring of the Catacombs | 2.90 | |
| Foefire Amulet | 2.90 | |
| Charged Band | 2.90 | |
| Carving of Jormag | 2.90 | |
| Svanir Talisman | 2.90 | |
| Orrian Artifact Shards | 2.90 | |
| Amulet of the Dead | 2.90 | |
| Dissonance Amplifier | 2.90 | |
| Revolutionary Flyers | 2.90 | |
| Band of the Moliarchy | 2.90 | |
| Beetlestone's Band | 2.90 | |
| Darkened Vine | 2.90 | |
| Twisted Tendril | 2.90 | |
| Flame Legion Ritual Book | 2.90 | |
| Wildfire Talisman | 2.90 | |
| Passiflora Mithril Ring | 2.90 | |
| Passiflora Mithril Earring | 2.90 | |
| Passiflora Mithril Amulet | 2.90 | |
| Cavalier's Ring | 2.90 | |
| Magi's Ring | 2.90 | |
| Rabid Ring | 2.90 | |
| Soldier's Ring | 2.90 | |
| Cavalier's Amulet | 2.90 | |
| Magi's Amulet | 2.90 | |
| Rabid Amulet | 2.90 | |
| Soldier's Amulet | 2.90 | |
| Cavalier's Field Guide | 2.90 | |
| Soldier's Field Guide | 2.90 | |
| Book of Rabid Deeds | 2.90 | |
| Magi's Field Guide | 2.90 | |
| Snowflake Copper Amulet | 2.90 | |
| Snowflake Copper Earring | 2.90 | |
| Snowflake Copper Ring | 2.90 | |
| Azurite Mithril Amulet | 2.90 | |
| Azurite Mithril Earring | 2.90 | |
| Azurite Mithril Ring | 2.90 | |
| Candy Corn Silver Pendant | 2.90 | |
| Sunstone Silver Band | 2.90 | |
| Sunstone Gold Ring | 2.90 | |
| Opal Platinum Ring | 2.90 | |
| Opal Mithril Ring | 2.90 | |
| Sunstone Silver Earring | 2.90 | |
| Sunstone Gold Earring | 2.90 | |
| Opal Platinum Earring | 2.90 | |
| Opal Mithril Earring | 2.90 | |
| Sunstone Silver Pendant | 2.90 | |
| Sunstone Gold Amulet | 2.90 | |
| Opal Platinum Amulet | 2.90 | |
| Opal Mithril Amulet | 2.90 | |
| Caudecus's Journal | 2.90 | |
| Coral Pendant | 2.90 | |
| Arctic Ring | 2.90 | |
| Diffraction Crystal | 2.90 | |
| Jar of Lost Souls | 2.90 | |
| TOTAL | 1,005.89 | — |
| Based on data from api.guildwars2.com | ||
| 1 Prices in gold | ||
There are different rarities in Guild Wars that suggest how often this item will occur in the game:
rarities <- c("Junk", "Basic", "Fine", "Masterwork", "Rare", "Exotic", "Ascended", "Legendary")
colors <-
c("Junk" = "#a3aeb9",
"Basic" = "#000000",
"Fine" = "#6a9cd9",
"Masterwork" = "#17911a",
"Rare" = "#f0cd25",
"Exotic" = "#e9ad1a",
"Ascended" = "#e64680",
"Legendary" = "#53238a")
data.frame(rarities) %>%
ggplot() +
geom_bar(aes(x = rarities, fill = rarities)) +
scale_x_discrete(limits = rarities) +
scale_fill_manual("Rarity", values = colors) +
theme(axis.text.y=element_blank(),
axis.title.y=element_blank(),
axis.title.x=element_blank())
What is obvious by the names, rarity on the left is in general more common than rarity on the right.
One important information is that a lot of very rare items
(Ascended and Legendary) can’t be sold to
other players. That’s the reason we see much more Rare and
Exotic items being sold for high prices than those really
rare ones. Junk is zero because it can only be sold at non-player
vendors.
In our dataset, we have the following distribution:
df <- df_sells %>%
count(rarity) %>%
arrange(desc(n))
df %>%
ggplot() +
geom_bar(aes(x = rarity, y = n, fill = rarity), stat='identity') +
scale_x_discrete(limits = rarities) +
scale_fill_manual("Rarity", values = colors) +
labs(title = "Rarity distribution", subtitle = "of all raritires", fill = "Rarity",
x = "Rarity", y = "Count", caption = paste("Data from", params$data_date))
relevant_rarities <- df %>%
filter(n > 100000)
relevant_rarities <- relevant_rarities$rarity
On the auction house, one can create an offer (to sell items) or an order (to buy items).
If an offer is created, the game automatically searches for the highest order price and if that’s fine for the seller, the item will be sold and sent. But the seller can also create an offer for a specific price (higher than the other orders) and wait, until someone wants to buy the item for the wanted amount.
The same is true for orders. Items can be bought immediately from the lowest offer or an order is created (lower than the other offers) and will wait, until someone sells the item for the wanted amount.
Let’s make an example:
df_top5 <- df_sells %>%
group_by(name) %>%
summarise(mean_unit_price_gold = mean(unit_price_gold),
mean_vendor_value_gold = mean(vendor_value) / 10000,
type = unique(type),
rarity = unique(rarity)) %>%
arrange(desc(mean_unit_price_gold)) %>%
ungroup() %>%
top_n(5, mean_unit_price_gold)
df_top5 %>%
ggplot() +
geom_bar(aes(x = name, y = mean_unit_price_gold, fill = rarity), stat="identity") +
geom_point(aes(x = name, y = mean_vendor_value_gold), fill = "#000000", stat="identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
scale_x_discrete(limits = df_top5$name) +
labs(x = "Name", y = "Mean price",
title = "Five most expensive items", subtitle = "Items being sold (offers), in gold",
caption = paste("Data from", params$data_date))
highest_offer <- df_top5 %>%
select(mean_unit_price_gold) %>%
first() %>%
first()
rm(df_top5)
Our top item here, the Festive Grymm Svaard, is a
cosmetic item that can only be bought with real money and a lot of luck.
It costs 10^{4} gold which seems to be the maximum possible price for an
auction. This is only an offer, someone wants to sell
this item for this price, so we’ll have to find out if anyone would ever
buy this.
df_buys_top5_grymm <- df_buys %>%
filter(name == 'Festive Grymm Svaard') %>%
select(name, unit_price_gold) %>%
top_n(5) %>%
arrange(desc((unit_price_gold)))
df_buys_top5_grymm %>%
gt(rowname_col = "name") %>%
tab_header(title = "Orders of the Festive Grymm Svaard") %>%
fmt_number(
columns = unit_price_gold
) %>%
cols_label(
unit_price_gold = "Unit price (in gold)"
)
| Orders of the Festive Grymm Svaard | |
|---|---|
| Unit price (in gold) | |
| Festive Grymm Svaard | 300.70 |
| Festive Grymm Svaard | 300.70 |
| Festive Grymm Svaard | 276.47 |
| Festive Grymm Svaard | 76.47 |
| Festive Grymm Svaard | 76.47 |
highest_order <- df_buys_top5_grymm %>%
select(unit_price_gold) %>%
first() %>%
first()
rm(df_buys_top5_grymm)
As we can see, the highest order on this item is 300.6983 gold. That’s pretty much, but far away from 10^{4}.
If someone is able to buy this item for 300.6983 gold and sell it again for 10^{4} that would be a good fortune.
Because the prices are so different, it’s unlikely that this will happen.
But who doesn’t want to look that fancy.
The Festive Grymm Svaard skin.
rm(highest_order)
rm(highest_offer)
Now let’s not only look at the top five items, but at the distribution in general.
df_sells_distribution <- df_sells %>%
group_by(name) %>%
summarise(mean_unit_price_gold = mean(unit_price_gold),
type = unique(type),
rarity = unique(rarity)) %>%
arrange(desc(mean_unit_price_gold))
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
df_sells_distribution %>%
ggplot() +
geom_histogram(aes(x = mean_unit_price_gold, fill = type), stat="count") +
scale_x_binned(limits = c(0, 1000)) +
labs(x = "Mean price", y = "Count",
title = "Item price distribution", subtitle = "Items being sold (offers), in gold",
caption = paste("Data from", params$data_date))
## Warning: Ignoring unknown parameters: binwidth, bins, pad
## Warning: Removed 423 rows containing non-finite values (stat_count).
As we can see, prices are common between 0 - 100 gold. What we also see is that there are a lot of different types of items. We can strip this down so we only see the relevant types.
relevant_types <- df_sells_distribution %>%
group_by(type) %>%
count() %>%
filter(n > 350)
df_sells_distribution <- df_sells_distribution %>%
filter(type %in% C(relevant_types$type))
df_sells_distribution %>%
ggplot() +
geom_histogram(aes(x = mean_unit_price_gold, fill = type), stat="count") +
scale_x_binned(limits = c(0, 50), breaks = c(0, 10, 25, 50)) +
facet_grid(. ~ type) +
labs(x = "Mean price", y = "Count",
title = "Item price distribution", subtitle = "Items being sold (offers), in gold",
caption = paste("Data from", params$data_date))
## Warning: Ignoring unknown parameters: binwidth, bins, pad
## Warning: Removed 378 rows containing non-finite values (stat_count).
For our goal to buy low and sell high, we need to know which items
have the best profit. As explained before, because of taxes, profit
means 85% of the sell price - buy price.
We first need to organize our data to calculate the difference between buys and sells:
df_all <- df_buys %>%
select(id, name, unit_price_gold, quantity) %>%
left_join(df_sells %>%
select(id, unit_price_gold, quantity),
by = "id", suffix = c("_buys", "_sells"))
Now we can calculate the profit:
df_all <- df_all %>%
mutate(unit_price_gold_diff = unit_price_gold_sells - unit_price_gold_buys,
profit = 0.85 * unit_price_gold_sells - unit_price_gold_buys,
quantity_diff = quantity_sells - quantity_buys) %>%
filter(profit > 0) %>%
arrange(unit_price_gold_diff) %>%
filter(quantity_diff > 0)
Let’s have a look at the top 100 items with the highest profit. As we
can see, even the top 100 are very special items that aren’t often
traded (quantity < 10).
This is interesting, but it doesn’t seem to be a good fit for real trading.
df_top_profit <- df_all %>%
group_by(name) %>%
summarise(profit = round(min(profit), digits = 2),
id = unique(id),
quantity = min(quantity_diff)) %>%
ungroup() %>%
arrange(desc(profit)) %>%
head(100)
df_top_profit <- df_top_profit %>%
left_join(df_sells %>% select(id, rarity, level, icon), by = "id") %>%
mutate(icon = web_image(icon, height = 25)) %>%
select(id, name, profit, rarity, level, quantity, icon) %>%
distinct(id, name, profit, rarity, level, quantity, icon)
df_top_profit %>%
datatable(extensions = c('ColReorder', 'Buttons', 'Responsive', 'Scroller', 'SearchPanes', 'Select'),
options = list(colReorder = TRUE,
dom = 'Bfrtip',
buttons = list('searchPanes', 'copy', list(
extend = 'collection',
buttons = c('csv', 'excel', 'pdf'),
text = 'Download'))),
escape = FALSE,
colnames=c("#", "Name", "Profit", "Rarity", "Level", "Quantity", "Icon")
)
Starting with a low budget, we can’t buy a lot of high-priced items. Therefore it’s good to know which items get the most profit compared to their costs. We always suppose the worst profit.
df_high_roi <- df_all %>%
group_by(name) %>%
summarise(
id = unique(id),
profitByCost = min(profit) / max(unit_price_gold_buys),
profit = min(profit),
cost = max(unit_price_gold_buys),
quantity = min(quantity_buys)
) %>%
ungroup() %>%
arrange(desc(profitByCost))
## `summarise()` has grouped output by 'name'. You can override using the
## `.groups` argument.
df_high_roi %>%
ggplot(aes(x = cost, y = profit)) +
geom_point() +
stat_summary(fun.data=mean_cl_normal) +
geom_smooth(method='lm', formula= y~x) +
labs(x = "Cost", y = "Profit",
title = "Cost and Profit of Items", subtitle = "Prices in gold",
caption = paste("Data from", params$data_date))
## Warning: Computation failed in `stat_summary()`:
As we can see, there are some outliers with high costs and low profit,
but also some high cost high profit ones.
df_high_roi %>%
mutate(
profitByCost = round(profitByCost, digits = 2),
profit = round(profit, digits = 2),
cost = round(cost, digits = 2)
) %>%
filter(profit > 0) %>%
select(name, profitByCost, profit, cost, quantity) %>%
datatable(extensions = c('ColReorder', 'Buttons', 'Responsive', 'Scroller', 'SearchPanes', 'Select'),
options = list(colReorder = TRUE,
dom = 'Bfrtip',
buttons = list('searchPanes', 'copy', list(
extend = 'collection',
buttons = c('csv', 'excel', 'pdf'),
text = 'Download'))),
escape = FALSE,
colnames=c("Name", "Profit by cost", "Profit", "Cost", "Quantity")
)
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
Great, we found items with very high profit. Let’s buy some of them that also have a high quantity.
Order_1
To get a better overview how the item cost and profits relate to each other, we’ll do a quick cluster analysis.
(taken from https://www.kirenz.com/post/2020-05-21-r-hierarchische-clusteranalyse/)
df_cl <- df_high_roi %>%
select(c("id", "profit", "cost"))
df_cl$profit <- scale(df_cl$profit, center = TRUE, scale = TRUE)
df_cl$cost <- scale(df_cl$cost, center = TRUE, scale = TRUE)
df_cl %>%
ggplot(aes(cost,
profit,
label = id)) +
geom_point() +
geom_text(size = 3,
check_overlap = FALSE,
vjust = 0, nudge_y = 0.1) +
theme_classic() +
ylab("Profit") +
xlab("Cost") +
scale_x_continuous(n.breaks = 20)
Because we scaled the numbers we can’t really say something about the real value of the items, but most of them are in the lower cost / lower profit range.
d <-
df_cl %>%
select(-id) %>%
dist(method = "euclidean")
hc <- hclust(d, method = "ward.D2")
#sort(unique(cophenetic(hc)))
hc$labels <- df_cl$id
grp <- cutree(hc, k = 4)
df_cl$cluster <- grp
rm(d)
rm(hc)
rm(grp)
df_cl %>%
ggplot(aes(cost,
profit,
label = id,
color = factor(cluster))) +
geom_point() +
geom_text(size = 3,
check_overlap = FALSE,
vjust = 0, nudge_y = 0.1) +
theme_classic() +
xlab("Cost") +
ylab("Profit") +
theme(legend.title=element_blank())
We created four clusters out of the data.
It’s interesting that there are a few items in the “(2) green” cluster that are also lower on profit and would probably also fit in the “(1) red” cluster, but it seems that the “(2) green” cluster is also characterized by a higher price than the “(1) red” one.
Let’s have a closer look on one of the most important resources in the game: Crafting material.
The base material can be collected everywhere in the world from trees, stones or herbs. As a player, you can learn up to two professions that can improve those base materials to something better and useful.
Let’s first check if there are any outliers in the data that we should clean.
df_crafting_sells <- df_sells %>%
filter(type == "CraftingMaterial")
df_crafting_buys <- df_buys %>%
filter(type == "CraftingMaterial")
ggplot() +
geom_boxplot(data = df_crafting_sells, aes(x = 'Sells', y = unit_price_gold)) +
geom_boxplot(data = df_crafting_buys, aes(x = 'Buys', y = unit_price_gold)) +
geom_hline(yintercept = 999, linetype="dashed", color = "red") +
scale_y_continuous(labels = comma) +
labs(title = "Outliers on crafting material", subtitle = "Baseline at 999 gold",
x = "", y = "Price in gold", caption = paste("Data from", params$data_date))
## Warning: Removed 42 rows containing non-finite values (stat_boxplot).
We can see that on the buys, there aren’t any outliers because the
human brain is still working and no one buys overpriced
goods. On the other hand, on the sells there are some people who are
exaggerating with the prices. Let’s strip everything at 1000 and
above.
df_sells_crafting_material <- df_sells %>%
filter(type == "CraftingMaterial",
unit_price_gold < 1000) %>%
group_by(rarity, name) %>%
summarize(
mean_price = mean(unit_price_gold),
sum_quantity = sum(quantity),
picture = web_image(unique(icon), height = 25)
) %>%
mutate(
quantity_label = label_number_si(accuracy=0.1)(sum_quantity),
mean_price_gold = round(mean_price, digits = 3)) %>%
arrange(desc(sum_quantity)) %>%
select(rarity, name, mean_price_gold, quantity_label, sum_quantity, picture)
df_sells_crafting_material %>%
datatable(extensions = c('ColReorder', 'Buttons', 'Responsive', 'Scroller', 'SearchPanes', 'Select'),
options = list(colReorder = TRUE,
dom = 'Bfrtip',
buttons = list('searchPanes', 'copy', list(
extend = 'collection',
buttons = c('csv', 'excel', 'pdf'),
text = 'Download'))),
escape = FALSE,
colnames=c("#", "Rarity", "Name", "Mean price in gold", "Quantity (short)", "Quantity", "Icon")
)
exp_sell_item_name <- df_sells_crafting_material %>%
arrange(desc(mean_price_gold)) %>%
ungroup() %>%
select(name) %>%
first() %>%
first()
Looking at Crafting Material, we first see that there is an overwhelming amount of offers on the market. There are 32.9M items of Nougat Center! Nougat Center, Chattering Skull and Plastic Fangs are all Halloween Event items and therefore not hard to get but also not very useful.
The first real crafting material is Large Claw, with
still over 21.4M items.
The most expensive mean crafting material is Sentinel’s Embroidered Silk Insignia.
Let’s summarize what we’ve learned so far:
If possible, it’s always better to sell items on the action house instead of the vendor. -> really?